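Methods that use deep generative models to extract policy primitives from offline demonstrations have shown promise for accelerating reinforcement learning (RL) on new tasks. Intuitively, these methods should also help train safe RL agents, because they enforce useful skills. However, we find that these techniques are not well equipped for safe policy learning: they ignore negative experiences (e.g., unsafe or unsuccessful ones) and focus only on positive experiences, which harms their ability to generalize to new tasks safely. Instead, we model the latent safety context using principled contrastive training on an offline dataset of demonstrations from many tasks, including both negative and positive experiences. Using this latent variable, our RL framework, SAFEty skill pRiors (SAFER), extracts task-specific safe primitive skills in order to generalize safely and successfully to new tasks. At the inference stage, policies trained with SAFER learn to compose safe skills into successful policies. We theoretically characterize why SAFER can enforce safe policy learning and demonstrate its effectiveness on several complex, safety-critical robotic grasping tasks inspired by the game Operation, in which SAFER outperforms state-of-the-art primitive learning methods in both success and safety.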
Translated by Google Translate
In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. Then, we devise policy gradient and actor-critic algorithms that (1) estimate this gradient, (2) update the policy in the descent direction, and (3) update the Lagrange multiplier in the ascent direction. For these algorithms we prove convergence to locally optimal policies. Finally, we demonstrate the effectiveness of our algorithms in an optimal stopping problem and an online marketing application.
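To make the CVaR constraint and the multiplier ascent step concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the function names, the risk budget `beta`, and the fixed step size are assumptions chosen for illustration, and the policy update itself is omitted. CVaR is estimated with the standard Rockafellar-Uryasev form, CVaR_alpha(C) = min over nu of nu + E[(C - nu)^+] / (1 - alpha), evaluated at the empirical alpha-quantile.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Estimate CVaR_alpha from sampled cumulative costs.

    Uses nu + mean((C - nu)^+) / (1 - alpha) with nu set to the empirical
    alpha-quantile (the value-at-risk). For any nu this expression is an
    upper bound on the minimum over nu, so this is an approximation.
    """
    costs = np.asarray(costs, dtype=float)
    nu = np.quantile(costs, alpha)
    return nu + np.mean(np.maximum(costs - nu, 0.0)) / (1.0 - alpha)

def multiplier_ascent_step(lam, costs, alpha, beta, step):
    """One projected ascent step on the Lagrange multiplier.

    Assumed Lagrangian: E[C] + lam * (CVaR_alpha(C) - beta), lam >= 0.
    Its derivative in lam is the constraint violation, so lam grows when
    the CVaR estimate exceeds the budget beta and is clipped at zero.
    """
    violation = empirical_cvar(costs, alpha) - beta
    return max(0.0, lam + step * violation)
```

In a full primal-dual loop, the policy parameters would take a descent step on the same Lagrangian between multiplier updates, typically on a faster timescale than the multiplier.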
As various city agencies and mobility operators navigate toward innovative mobility solutions, there is a need for strategic flexibility in well-timed investment decisions in the design and timing of mobility service regions, i.e., decisions cast as "real options" (RO). This problem becomes increasingly challenging when multiple RO interact in such investments. We propose a scalable, machine-learning-based RO framework for the multi-period sequential service region design and timing problem for mobility-on-demand (MoD) services, framed as a Markov decision process with non-stationary stochastic variables. A value function approximation policy from the literature uses multi-option least squares Monte Carlo simulation to obtain a policy value for a set of interdependent investment decisions treated as deferral options (CR policy). The goal is to determine the optimal selection and timing of a set of zones to include in a service region. However, prior work required explicit enumeration of all possible sequences of investments. To address the combinatorial complexity of such enumeration, we propose a new "deep" RO policy variant that uses an efficient recurrent neural network (RNN) based ML method (CR-RNN policy) to sample sequences, forgoing the need for enumeration and making the network design and timing policy tractable for large-scale implementation. Experiments on multiple service region scenarios in New York City (NYC) show that the proposed policy substantially reduces the overall computational cost (time reduction for RO evaluation of > 90% of total investment sequences is achieved), with zero to near-zero gap compared to the benchmark. A case study of sequential service region design for the expansion of MoD services in Brooklyn, NYC shows that using the CR-RNN policy to determine the optimal RO investment strategy yields similar performance (within 0.5% of the CR policy value) with significantly reduced computation time (about 5.4 times faster).
A general, {\em rectangular} kernel matrix may be defined as $K_{ij} = \kappa(x_i,y_j)$ where $\kappa(x,y)$ is a kernel function and where $X=\{x_i\}_{i=1}^m$ and $Y=\{y_i\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are not well-separated (e.g., the points in $X$ and $Y$ may be ``intermingled''). Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function and avoid forming the matrix, which rules out most algebraic techniques. In particular, we seek methods that can scale linearly, i.e., with computational complexity $O(m)$ or $O(n)$ for a fixed accuracy or rank. The main idea in this paper is to {\em geometrically} select appropriate subsets of points to construct a low-rank approximation. An analysis in this paper guides how this selection should be performed.
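The idea of building a low-rank factorization from a selected subset of points, without ever forming $K$, can be sketched as follows. This is a generic landmark (Nyström-type) construction, not the selection rule analyzed in the paper. The Gaussian kernel, the landmark set `Z`, and the function names are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(A, B, lengthscale=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def landmark_factors(X, Y, Z, kernel):
    """Return (U, V) such that kernel(X, Y) is approximated by U @ V.

    Z holds r landmark points. Only the m-by-r, r-by-r and r-by-n kernel
    blocks are evaluated, so the cost is linear in m and n for fixed r and
    the full m-by-n matrix is never formed.
    """
    K_xz = kernel(X, Z)                      # m x r
    K_zz = kernel(Z, Z)                      # r x r
    K_zy = kernel(Z, Y)                      # r x n
    # Truncated pseudo-inverse: K_zz is often numerically rank-deficient.
    V = np.linalg.pinv(K_zz, rcond=1e-10) @ K_zy
    return K_xz, V
```

A product with a vector `w` can then be applied as `U @ (V @ w)`, which keeps the work linear in the number of points. How the landmarks are chosen geometrically is exactly the question the paper addresses and is left open here.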
Transfer Learning is an area of statistics and machine learning research that seeks answers to the following question: how do we build successful learning algorithms when the data available for training our model is qualitatively different from the data we hope the model will perform well on? In this thesis, we focus on a specific area of Transfer Learning called label shift, also known as quantification. In quantification, the aforementioned discrepancy is isolated to a shift in the distribution of the response variable. In such a setting, accurately inferring the response variable's new distribution is both an important estimation task in its own right and a crucial step for ensuring that the learning algorithm can adapt to the new data. We make two contributions to this field. First, we present a new procedure called SELSE which estimates the shift in the response variable's distribution. Second, we prove that SELSE is semiparametric efficient among a large family of quantification algorithms, i.e., SELSE's normalized error has the smallest possible asymptotic variance matrix compared to any other algorithm in that family. This family includes nearly all existing algorithms, including ACC/PACC quantifiers and maximum-likelihood-based quantifiers such as EMQ and MLLS. Empirical experiments reveal that SELSE is competitive with, and in many cases outperforms, existing state-of-the-art quantification methods, and that this improvement is especially large when the number of test samples is far greater than the number of training samples.
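For readers unfamiliar with quantification, the sketch below illustrates the binary adjusted classify-and-count (ACC) estimator, one of the existing baselines named above. It is not SELSE. The function name and argument layout are assumptions for illustration. The estimator corrects the raw fraction of predicted positives on the test set using the classifier's true and false positive rates measured on labeled validation data.

```python
import numpy as np

def acc_prevalence(val_true, val_pred, test_pred):
    """Binary adjusted classify-and-count estimate of the positive rate.

    val_true, val_pred: labels and hard predictions (0/1) on labeled
    validation data from the training distribution.
    test_pred: hard predictions (0/1) on unlabeled test data.
    """
    val_true = np.asarray(val_true)
    val_pred = np.asarray(val_pred)
    test_pred = np.asarray(test_pred)
    tpr = np.mean(val_pred[val_true == 1] == 1)   # true positive rate
    fpr = np.mean(val_pred[val_true == 0] == 1)   # false positive rate
    cc = np.mean(test_pred == 1)                  # plain classify-and-count
    if tpr == fpr:
        # Classifier carries no signal; the correction is undefined.
        return float(cc)
    # Under label shift, cc = p * tpr + (1 - p) * fpr; solve for p.
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))
```

The correction relies on the label shift assumption that the class-conditional behavior of the classifier (its true and false positive rates) is the same on training and test data, so only the class proportions change.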
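Crowd sequential annotation can be an efficient and cost-effective way to build large datasets for sequence labeling. Unlike labeling independent instances, in crowd sequential annotation the quality of a label sequence depends on each annotator's level of expertise in capturing the internal dependencies of every token in the sequence. In this paper, we propose modeling sequential annotation for sequence labeling with crowds (SA-SLC). First, a conditional probabilistic model is developed to jointly model the sequential data and the annotators' expertise, in which a categorical distribution is introduced to estimate each annotator's reliability in capturing local and non-local label dependencies during sequential annotation. To accelerate marginalization in the proposed model, a valid label sequence inference (VLSE) method is proposed to derive valid ground-truth label sequences from crowd sequential annotations. VLSE derives possible ground-truth labels at the token level and further prunes suboptimal label sequences during the forward inference used for label sequence decoding. VLSE reduces the number of candidate label sequences and improves the quality of the possible ground-truth label sequences. Experimental results on several sequence labeling tasks in natural language processing show the effectiveness of the proposed model.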
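Existing disambiguation strategies for partial structured output learning do not generalize well to cases in which some candidates may be false positives or may be similar to the ground-truth label. In this paper, we propose a novel weak disambiguation for partial structured output learning (WD-PSL). First, a piecewise large-margin formulation is generalized to partial structured output learning, which effectively avoids handling the large number of candidate structured outputs that arise for complex structures. Second, in the proposed weak disambiguation strategy, each candidate label is assigned a confidence value indicating how likely it is to be the true label, which aims to reduce the negative effect of wrong ground-truth label assignments during learning. Two large margins are then formulated to combine two types of constraints: the disambiguation between candidates and non-candidates, and the weak disambiguation among candidates. Within an alternating optimization framework, a new 2n-slack variable cutting-plane algorithm is developed to accelerate each iteration of the optimization. Experimental results on several sequence labeling tasks in natural language processing show the effectiveness of the proposed model.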
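Existing partial sequence labeling models mainly focus on the max-margin framework, which fails to provide an uncertainty estimate for the prediction. In addition, the unique ground-truth disambiguation strategy employed by these models may introduce wrong label information into parameter learning. In this paper, we propose structured Gaussian processes for partial sequence labeling (SGPPSL), which encode uncertainty in the prediction and require no extra effort for model selection and hyperparameter learning. The model employs a factor-as-piece approximation that divides the linear-chain graph structure into a set of pieces, which preserves the basic Markov random field structure and effectively avoids handling the large number of candidate output sequences generated by partially annotated data. A confidence measure is then introduced into the model to account for the different contributions of candidate labels, which enables ground-truth label information to be used in parameter learning. Based on a derived lower bound of the model's variational lower bound, the variational parameters and the confidence measure are estimated within an alternating optimization framework. Moreover, a weighted Viterbi algorithm is proposed to incorporate the confidence measure into sequence prediction; it takes into account the label ambiguity arising from multiple annotations in the training data and thus helps improve performance. SGPPSL is evaluated on several sequence labeling tasks, and the experimental results show the effectiveness of the proposed model.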
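As a rough illustration of how a confidence measure might enter Viterbi decoding on a linear chain, consider the sketch below. The abstract does not specify the weighting scheme, so the choice here (adding the log of a per-label confidence weight to the emission scores) is purely an assumption, as are the function name and argument shapes. Without weights, the function is ordinary Viterbi decoding in log space.

```python
import numpy as np

def viterbi(emission, transition, weights=None):
    """Decode the best label path on a linear chain.

    emission:   (T, L) array of log-scores per position and label.
    transition: (L, L) array of log-scores, from label i to label j.
    weights:    optional (L,) positive confidence per label (hypothetical
                weighting: log-weights are added to the emission scores).
    """
    emission = np.asarray(emission, dtype=float)
    transition = np.asarray(transition, dtype=float)
    if weights is not None:
        emission = emission + np.log(np.asarray(weights, dtype=float))[None, :]
    T, L = emission.shape
    score = emission[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transition      # cand[i, j]: come from i, go to j
        back[t] = np.argmax(cand, axis=0)       # best predecessor for each j
        score = cand[back[t], np.arange(L)] + emission[t]
    # Follow the back-pointers from the best final label.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With uniform weights the decoded path is unchanged, which makes the weighted variant easy to compare against the standard decoder.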
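Recently, Deepfake has drawn considerable public attention because of security and privacy concerns in social media digital forensics. As the Deepfake videos spreading widely on the Internet become more realistic, traditional detection techniques have failed to distinguish real from fake. Most existing deep learning methods focus mainly on local features and relations within the face image, using convolutional neural networks as a backbone. However, local features and relations are insufficient for a model to learn enough general information for Deepfake detection. As a result, existing Deepfake detection methods have reached a bottleneck in further improving detection performance. To address this issue, we propose a deep convolutional Transformer that incorporates the decisive image features both locally and globally. Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy. Moreover, we employ the rarely discussed image keyframes in model training to improve performance, and we visualize the gap in feature quantity between key and normal image frames caused by video compression. Finally, we illustrate transferability with extensive experiments on several Deepfake benchmark datasets. The proposed solution consistently outperforms several state-of-the-art baselines in both within-dataset and cross-dataset experiments.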
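This manuscript addresses the simultaneous problems of predicting all-cause inpatient readmission or death after discharge and quantifying the impact of discharge placement in preventing these adverse events. To this end, we developed an inherently interpretable multilevel Bayesian modeling framework inspired by the piecewise linearity of ReLU-activated deep neural networks. In a survival model, we explicitly adjust for confounding in order to quantify local average treatment effects for discharge placement interventions. We trained the model on a 5% sample of Medicare beneficiaries from 2008 and 2011, and then tested it on 2012 claims. Evaluated on classification accuracy for 30-day all-cause unplanned readmission (defined using the official CMS methodology), the model performed similarly to XGBoost, logistic regression (after feature engineering), and a Bayesian deep neural network trained on the same data. On this 30-day classification task, tested on held-out future data, the model achieved an AUROC of approximately 0.76 and an AUPRC of approximately 0.50 (relative to the overall positive rate in the testing data), demonstrating that one need not sacrifice interpretability for accuracy. In addition, the model had a testing AUROC of 0.78 for classifying 90-day all-cause unplanned readmission or death. We can easily peer into our inherently interpretable model and summarize its main findings. Furthermore, we demonstrate how the black-box post hoc explainer tool SHAP generates explanations that are not supported by the fitted model and that, if taken at face value, do not provide enough context to make a model actionable.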